Module 04
What can be included in the aes() function?
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>)
  )
Variables (quantitative or categorical) can be mapped to the following aesthetics:
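As a minimal sketch of aesthetic mappings (the dataset and variable choices here are illustrative assumptions, and the palmerpenguins package is assumed to be installed), quantitative variables can be mapped to position while categorical variables are mapped to color and shape:

```r
library(ggplot2)

# Assumption: palmerpenguins is installed. The dataset is referenced by
# namespace so the column names used below (bill_length_mm, ...) are the
# ones from that package.
penguins <- palmerpenguins::penguins

p_aes <- ggplot(data = penguins) +
  geom_point(
    mapping = aes(
      x = bill_length_mm,  # quantitative variable -> horizontal position
      y = bill_depth_mm,   # quantitative variable -> vertical position
      color = species,     # categorical variable -> point color
      shape = island       # categorical variable -> point shape
    ),
    na.rm = TRUE           # drop rows with missing measurements quietly
  )

p_aes
```

Anything placed inside aes() varies with the data; a fixed value such as size = 4 belongs outside aes().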
What’s happening here?
The pair plot above is an example of a facet plot. We can create a facet plot by adding the facet_wrap() or facet_grid() function to the ggplot object.
facet_grid()
~ operator
Use the ~ operator to specify the rows and columns of the facet grid: the variable on the left of ~ defines the rows and the variable on the right defines the columns.
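The rows ~ columns convention can be sketched as follows. This is an illustrative example rather than the plot from the slides; it assumes the palmerpenguins package is installed, and the object names (base_plot, p_grid, p_cols, p_rows) are hypothetical.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed.
penguins <- palmerpenguins::penguins

base_plot <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm), na.rm = TRUE)

# Left of ~ defines the rows, right of ~ defines the columns.
p_grid <- base_plot + facet_grid(species ~ island)

# A dot on the left means "no row variable": one row, one column per species.
p_cols <- base_plot + facet_grid(. ~ species)

# A dot on the right means "no column variable": one column, one row per species.
p_rows <- base_plot + facet_grid(species ~ .)

p_grid
```

facet_wrap(~ species) is the usual alternative when there is a single faceting variable and the panels should wrap across several rows.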
Can you explain the use of “.” in R? Specifically, as it’s used in ggplot2 for functions like facet_grid(. ~ species)
The dot (.) in R, especially as used in ggplot2 functions like facet_grid(), has a special meaning. Let me explain its use:
In ggplot2’s facet_grid():
In the context of facet_grid(. ~ species), the dot serves as a placeholder. It indicates that you don’t want to split the plot on the rows (left side of the tilde ~), but you do want to split it on the columns (right side of the tilde) based on the “species” variable.
facet_grid(. ~ species): Creates separate plots for each unique value in the “species” column, arranged horizontally.
facet_grid(species ~ .): Would create separate plots for each unique value in the “species” column, arranged vertically.
facet_grid(var1 ~ var2): Would create a grid of plots, with “var1” determining the rows and “var2” the columns.
In other R contexts:
The dot can also be used in other situations in R:
In pipelines written with %>%, the dot represents the object being passed through the pipe; the native pipe |> uses _ as its placeholder instead.
Chapter 9 9.4.1 Exercises, questions 1 through 7
Chapter 9 9.3.1 Exercises, questions 1 through 4
Several geometric objects in ggplot2 carry out statistical transformations. Examples include:
geom_histogram() calculates the count of observations in bins.
geom_density() calculates the density of a variable.
geom_bar() calculates the count of observations for each level of a categorical variable.
geom_boxplot() calculates the median, quartiles, and outliers of a variable.
geom_smooth() calculates a smoothed line (or fit) through the data.
Figure 9.2 from R4DS 2e provides the following graphic:
Every geometric object geom_* has a default statistical transformation stat_*, and every stat_* has a default geom.
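The pairing between geoms and stats can be checked with a small sketch. It assumes the palmerpenguins package is installed; the object names are hypothetical. geom_bar() uses stat = "count" by default and stat_count() uses geom = "bar" by default, so the two layers below should compute the same values.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed.
penguins <- palmerpenguins::penguins

# Geom-first: geom_bar() delegates the counting to stat_count().
p_geom <- ggplot(penguins) +
  geom_bar(aes(x = species))

# Stat-first: stat_count() draws its result with geom_bar().
p_stat <- ggplot(penguins) +
  stat_count(aes(x = species))

# layer_data() returns the data frame after the stat has been applied.
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count

identical(counts_geom, counts_stat)
```

Inspecting layer_data() is a convenient way to see which columns a stat has computed before deciding what to map.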
Each stat also computes additional variables that can be mapped in place of the default. For instance, stat_bin(), the stat behind geom_histogram(), computes the following:
These are calculated by the ‘stat’ part of layers and can be accessed with delayed evaluation.
after_stat(count)
number of points in bin.
after_stat(density)
density of points in bin, scaled to integrate to 1.
after_stat(ncount)
count, scaled to a maximum of 1.
after_stat(ndensity)
density, scaled to a maximum of 1.
after_stat(width)
widths of bins.
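As a hedged illustration of delayed evaluation, the sketch below maps the computed density (rather than the default count) to the y aesthetic of a histogram. It assumes the palmerpenguins package is installed; the bin width of 1 mm is an arbitrary choice for the example.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed.
penguins <- palmerpenguins::penguins

# after_stat(density) is evaluated after stat_bin() has binned the data,
# so the bar heights are densities instead of counts.
p_density <- ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, na.rm = TRUE)

# The bar areas (height times bin width) should sum to 1.
bins <- layer_data(p_density)
total_area <- sum(bins$density * (bins$xmax - bins$xmin))
total_area

p_density
```

Scaling to density makes histograms of groups with different sample sizes comparable, and lets a geom_density() curve be overlaid on the same axis.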
Location AND Variation
machine |>
  group_by(machine, time) |>
  summarise(mean_diameter = mean(diameter),
            sd_diameter = sd(diameter),
            # half-width of an approximate 95% confidence interval for the mean
            ci_diameter = 1.96 * sd(diameter) / sqrt(n())) |>
  ungroup() |>
  mutate(machine_time = str_c(machine, time, sep = "_")) |>
  ggplot() +
  geom_point(aes(x = machine_time, y = mean_diameter, color = machine), size = 4) +
  geom_errorbar(aes(x = machine_time,
                    ymin = mean_diameter - ci_diameter,
                    ymax = mean_diameter + ci_diameter), width = 0.25)
For most plots, the labs() layer will provide the right amount of customization.
penguins |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  labs(title = "Bill Length and Depth Serve as Natural Classifiers",
       subtitle = "palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       color = "Species",
       caption = "https://github.com/allisonhorst/palmerpenguins")
geom_text()
annotate()
library(ggrepel)
# Assumed definition (not shown in the source): per-species mean bill measurements
mean_bill_depth_length <- penguins |>
  group_by(species) |>
  summarise(mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
            mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE))
penguins |>
  ggplot() +
  # raw observations, faded so the labeled means stand out
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = species),
             alpha = 1/4, show.legend = FALSE) +
  geom_point(data = mean_bill_depth_length,
             aes(x = mean_bill_length, y = mean_bill_depth, color = species),
             size = 4, show.legend = FALSE) +
  geom_label_repel(data = mean_bill_depth_length,
                   aes(x = mean_bill_length, y = mean_bill_depth, label = species, color = species),
                   size = 4, fontface = "bold", box.padding = 1, show.legend = FALSE, seed = 42) +
  labs(
    title = "Comparison of Means",
    subtitle = "Bill Length and Depth by Species",
    x = "Bill Length (mm)",
    y = "Bill Depth (mm)",
    caption = "palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data"
  ) +
  theme_classic()
annotate()
furnace |>
  ggplot() +
  geom_point(aes(x = seq_along(thickness), y = thickness)) +
  geom_hline(yintercept = 560, linetype = "dashed") +
  geom_hline(yintercept = c(460, 660), linetype = "dashed", color = "red") +
  annotate("text", x = 175, y = 560 - 50, label = "Target Thickness", color = "black", size = 4, fontface = "bold") +
  # line widths are set with linewidth; size is deprecated for lines since ggplot2 3.4.0
  annotate("segment", x = 175, xend = 190, y = 520, yend = 560,
           arrow = arrow(ends = "last", length = unit(0.3, "cm")),
           linewidth = 0.5, color = "black") +
  annotate("text", x = 20, y = 460 + 10, label = "Lower Control Limit", color = "red", size = 4, fontface = "bold") +
  annotate("text", x = 20, y = 660 - 10, label = "Upper Control Limit", color = "red", size = 4, fontface = "bold") +
  labs(
    title = "Run Sequence Plot of Furnace Data",
    subtitle = "Thickness of processed wafers",
    x = "Sample Number",
    y = "Thickness (Å)",
    caption = "\nhttps://www.itl.nist.gov/div898/handbook/ppc/section5/ppc511.htm"
  ) +
  theme_classic()
Source: U.S. Census Bureau
Release: Income and Poverty in the United States
Units: 2022 CPI-U-RS Adjusted Dollars, Not Seasonally Adjusted
Frequency: Annual
Household data are collected as of March.
. . . the Consumer Price Index retroactive series using current methods (R-CPI-U-RS) presents an estimate of the CPI for all Urban Consumers (CPI-U) from 1978 to the present that incorporates, when possible, most of the improvements made over that time span into the entire series.
rmhi |>
  ggplot() +
  geom_line(aes(x = date, y = median_income), color = "blue") +
  labs(
    title = "Real Median Household Income in the United States",
    subtitle = "Units: CPI-U-RS Adjusted Dollars, Not Seasonally Adjusted",
    x = "Year",
    y = "2022 CPI-U-RS Adjusted Dollars",
    caption = "Source: U.S. Census Bureau"
  ) +
  theme_linedraw()
date_limits <- c(as_date("1980-01-01"), as_date("2025-01-01"))
date_breaks <- seq(as_date("1980-01-01"), as_date("2025-01-01"), by = "5 years")
rmhi |>
  ggplot() +
  geom_line(aes(x = date, y = median_income), color = "blue") +
  labs(
    title = "Real Median Household Income in the United States",
    subtitle = "Units: CPI-U-RS Adjusted Dollars, Not Seasonally Adjusted",
    x = "Year",
    y = "2022 CPI-U-RS Adjusted Dollars",
    caption = "Source: U.S. Census Bureau"
  ) +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date(limits = date_limits, breaks = date_breaks, date_labels = "%Y") +
  theme_linedraw()
Look at ?theme
Original visualization:
date_limits <- c(as_date("1980-01-01"), as_date("2025-01-01"))
date_breaks <- seq(as_date("1980-01-01"), as_date("2025-01-01"), by = "5 years")
rmhi |>
  ggplot() +
  geom_line(aes(x = date, y = median_income), color = "blue") +
  labs(
    title = "Real Median Household Income in the United States",
    subtitle = "Units: CPI-U-RS Adjusted Dollars, Not Seasonally Adjusted",
    x = "Year",
    y = "2022 CPI-U-RS Adjusted Dollars",
    caption = "Source: U.S. Census Bureau"
  ) +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date(limits = date_limits, breaks = date_breaks, date_labels = "%Y") +
  theme_linedraw()
Updated visualization:
date_limits <- c(as_date("1980-01-01"), as_date("2025-01-01"))
date_breaks <- seq(as_date("1980-01-01"), as_date("2025-01-01"), by = "5 years")
rmhi |>
  ggplot() +
  geom_line(aes(x = date, y = median_income), color = "blue") +
  labs(
    title = "Real Median Household Income in the United States",
    subtitle = "CPI-U-RS Adjusted Dollars (2022), Not Seasonally Adjusted",
    x = NULL,
    y = NULL,
    caption = "Source: U.S. Census Bureau"
  ) +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date(limits = date_limits, breaks = date_breaks, date_labels = "%Y") +
  theme_linedraw() +
  theme(
    plot.title = element_text(size = 20, face = "bold", hjust = 0),
    plot.title.position = "plot",
    plot.subtitle = element_text(size = 14, face = "italic", hjust = 0),
    axis.title.x = element_blank(),
    axis.text = element_text(size = 12),
    axis.title.y = element_blank(),
    plot.caption = element_text(size = 10, hjust = 0, vjust = 0)
  )
The goal of patchwork is to make it ridiculously simple to combine separate ggplots into the same graphic. As such it tries to solve the same problem as gridExtra::grid.arrange() and cowplot::plot_grid but using an API that incites exploration and iteration, and scales to arbitrarily complex layouts.
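A small sketch of how patchwork's operators compose plots, assuming the patchwork and palmerpenguins packages are installed. The three plots and their names are hypothetical and chosen only to show the layout operators: | places plots side by side and / stacks them.

```r
library(ggplot2)
library(patchwork)

# Assumption: palmerpenguins is installed.
penguins <- palmerpenguins::penguins

p_box <- ggplot(penguins) +
  geom_boxplot(aes(x = species, y = bill_depth_mm), na.rm = TRUE)

p_scatter <- ggplot(penguins) +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm), na.rm = TRUE)

p_hist <- ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm), binwidth = 1, na.rm = TRUE)

# Two plots in one row.
side_by_side <- p_box | p_scatter

# Nested layout: the row above, with the histogram spanning the full width below.
stacked <- (p_box | p_scatter) / p_hist

stacked
```

Parentheses control nesting, so the same three plots can be rearranged by moving the operators rather than by computing grid coordinates.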
library(patchwork)
p1 <- penguins |>
  ggplot() +
  geom_boxplot(aes(x = species, y = bill_depth_mm, fill = species), show.legend = FALSE) +
  labs(tag = "A") +
  theme(plot.tag.location = "panel", plot.tag.position = "topright")
p2 <- penguins |>
  ggplot() +
  geom_point(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  labs(tag = "B") +
  theme(plot.tag.location = "panel", plot.tag.position = "topright")
# combine side by side and gather the legends into a single shared guide
(p1 + p2) + plot_layout(guides = "collect")
Recreate the plot from the article:
library(ggtext)
# Assumed placeholder: the source does not show how `color` is defined.
color <- "steelblue"
caption_text <- paste(str_wrap("Source: Pew Research Center analysis of a random selection of URLs collected by the Common Crawl web repository (n=999,989) and checked using page and DNS response codes. Web pages defined as inaccessible if they returned a status code of 204, 400, 404, 410, 500, 501, 502, 503, 523 or did not return a valid status code.",
                               width = 86),
                      "\n\"When Online Content Disappears\"")
pew_df |>
  ggplot() +
  geom_line(aes(x = year, y = percent), color = color, linewidth = 3) +
  geom_point(aes(x = year, y = percent), shape = 21, color = color, fill = "white", size = 4, stroke = 1) +
  # only the first label carries the percent sign
  geom_text(aes(x = year, y = percent, label = paste0(percent, "%")),
            nudge_y = 2.2, nudge_x = 0, size = 4, color = color,
            data = filter(pew_df, year == 2013)) +
  geom_text(aes(x = year, y = percent, label = percent),
            nudge_y = 2.2, nudge_x = 0, size = 4, color = color,
            data = filter(pew_df, year >= 2014)) +
  # annotate("text", x = 2014.05, y = 5, label = "PEW RESEARCH CENTER", color = "black", size = 4, fontface = "bold") +
  labs(
    title = "38% of webpages from 2013 are no longer accessible",
    subtitle = "% of links from each year that are no longer accessible as of October 2023",
    x = "",
    y = NULL,
    caption = caption_text,
    tag = "PEW RESEARCH CENTER"
  ) +
  theme_minimal() +
  scale_y_continuous(limits = c(5, 42)) +
  scale_x_continuous(limits = c(2012.75, 2023.25), breaks = 2013:2023) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold"),
    plot.title.position = "plot",
    plot.subtitle = element_markdown(size = 14, face = "italic", color = "#666666"),
    plot.caption = element_text(size = 12, face = "italic", hjust = 0, color = "#666666"),
    plot.caption.position = "plot",
    axis.text.x = element_text(size = 12),
    axis.line.x = element_line(color = "black"),
    axis.ticks.x = element_line(color = "black"),
    axis.text.y = element_blank(),
    plot.tag.location = "plot",
    plot.tag.position = "bottomright",
    plot.tag = element_text(size = 12, face = "bold", color = "black"),
    panel.grid = element_blank()
  )
Applied Statistical Techniques